Toward a generic representation of random variables for machine learning
This paper presents a pre-processing and a distance which improve the
performance of machine learning algorithms working on independent and
identically distributed stochastic processes. We introduce a novel
non-parametric approach to represent random variables which splits apart
dependency and distribution without losing any information. We also propound an
associated metric leveraging this representation and its statistical estimate.
Besides experiments on synthetic datasets, the benefits of our contribution are
illustrated through the example of clustering financial time series, for
instance prices from the credit default swaps market. Results are available on
the website www.datagrapple.com and an IPython Notebook tutorial is available
at www.datagrapple.com/Tech for reproducible research.
Comment: submitted to Pattern Recognition Letter
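As a rough illustration of what "splitting apart dependency and distribution without losing any information" can look like for a single sample, the sketch below separates a series into its normalized ranks (which carry the ordering used to measure dependence) and its sorted values (the empirical distribution). This is an assumption-laden toy, not the representation or the metric defined in the paper; the function names `split_dependence_and_distribution` and `reconstruct` are hypothetical.

```python
import numpy as np

def split_dependence_and_distribution(x):
    """Split a sample into normalized ranks and sorted values (illustrative only)."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=int)
    ranks[order] = np.arange(len(x))          # rank of each observation, 0-based
    uniform_ranks = (ranks + 1) / len(x)      # "dependence" part, values in (0, 1]
    sorted_values = x[order]                  # "distribution" part (empirical quantiles)
    return uniform_ranks, sorted_values

def reconstruct(uniform_ranks, sorted_values):
    """Recover the original sample from the two parts."""
    n = len(sorted_values)
    ranks = np.rint(uniform_ranks * n).astype(int) - 1
    return sorted_values[ranks]

rng = np.random.default_rng(0)
x = rng.standard_normal(500)                  # synthetic i.i.d. sample
u, q = split_dependence_and_distribution(x)
x_back = reconstruct(u, q)                    # same values as x, in the same order
```

The round trip through `reconstruct` is what makes the split lossless in this toy setting: the ranks alone fix the ordering, and the sorted values alone fix the marginal distribution.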
CorrGAN: Sampling Realistic Financial Correlation Matrices Using Generative Adversarial Networks
We propose a novel approach for sampling realistic financial correlation
matrices. This approach is based on generative adversarial networks.
Experiments demonstrate that generative adversarial networks are able to
recover most of the known stylized facts about empirical correlation matrices
estimated on asset returns. This is the first time such results are documented
in the literature. Practical financial applications range from trading
strategies enhancement to risk and portfolio stress testing. Such generative
models can also help ground empirical finance deeper into science by allowing
for falsifiability of statements and more objective comparison of empirical
methods.
A proposal of a methodological framework with experimental guidelines to investigate clustering stability on financial time series
We present in this paper an empirical framework motivated by the practitioner's
point of view on stability. The goal is both to assess clustering validity and
to yield market insights by providing, through the data perturbations we
propose, a multi-view of the assets' clustering behaviour. The perturbation framework is
illustrated on an extensive credit default swap time series database available
online at www.datagrapple.com.
Comment: Accepted at ICMLA 201
Clustering Financial Time Series: How Long is Enough?
Researchers have used from 30 days to several years of daily returns as
source data for clustering financial time series based on their correlations.
This paper sets up a statistical framework to study the validity of such
practices. We first show that clustering correlated random variables from their
observed values is statistically consistent. Then, we also give a first
empirical answer to the much debated question: How long should the time series
be? If too short, the clusters found can be spurious; if too long, dynamics can
be smoothed out.
Comment: Accepted at IJCAI 201
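The practice discussed in this abstract, clustering assets from the correlations of their returns, can be sketched on synthetic data as follows. Everything here is an assumption made for illustration and not the paper's experimental setup: the one-factor-per-group simulation, the distance sqrt(2(1 - rho)), average-linkage hierarchical clustering, and the helper names `simulate_returns` and `cluster_by_correlation`.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def simulate_returns(n_days, rng, n_groups=3, group_size=5, loading=0.8):
    """Synthetic returns: each group of assets shares one common factor."""
    factors = rng.standard_normal((n_days, n_groups))
    noise = rng.standard_normal((n_days, n_groups * group_size))
    common = np.repeat(factors, group_size, axis=1)
    return loading * common + np.sqrt(1.0 - loading**2) * noise

def cluster_by_correlation(returns, n_clusters):
    """Hierarchical clustering on a correlation-based distance."""
    corr = np.corrcoef(returns, rowvar=False)
    dist = np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))
    np.fill_diagonal(dist, 0.0)               # remove floating-point residue
    tree = linkage(squareform(dist, checks=False), method="average")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

rng = np.random.default_rng(42)
labels_long = cluster_by_correlation(simulate_returns(2000, rng), 3)
labels_short = cluster_by_correlation(simulate_returns(30, rng), 3)
```

Comparing `labels_short` with `labels_long` over repeated draws gives a feel for how sample length affects the recovered clusters, which is the question the abstract raises.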